CONCENTRATION FOR NONCOMMUTATIVE POLYNOMIALS IN RANDOM MATRICES

Abstract. We present a concentration inequality for linear functionals of noncommutative
polynomials in random matrices. Our hypotheses cover most standard ensembles,
including Gaussian matrices, matrices with independent uniformly bounded entries and 
unitary or orthogonal matrices. 



1. Introduction 

The starting point of this paper was an inquiry of W. Bryc concerning almost sure conver- 
gence for certain non-Gaussian matrix models in free probability. Almost sure convergence 
questions often reduce to concentration inequalities, which may be interesting in their own 
right, and our purpose is to present one such inequality. 

Our approach is as follows. We start by defining the convex concentration property 
(CCP) of normed-space-valued random variables. When specialized to random matrices, 
the class CCP contains most standard ensembles, in particular the (appropriately nor- 
malized) Wigner-type matrices with independent bounded entries that were the object of 
Bryc's inquiry. Then we state and prove a concentration inequality for noncommutative 
polynomials in independent random matrices verifying the CCP. 

This approach is inspired by the results of M. Talagrand [28], [29], [30] on concentration
of measure in product spaces. These tools were first adapted to the random matrix context
by Guionnet and Zeitouni in [11] and by Krivelevich and Vu in [17], with subsequent
applications in [2], [21]. However, various features of the present setup (noncommutativity,
non-selfadjointness, the absence of the Lipschitz property in polynomials of degree greater 
than 1) do not fit into the standard framework and, consequently, a few additional tricks 
will be required. 

2. Convex concentration property 

We say that a random vector X in a normed space V satisfies the (subgaussian) convex 
concentration property (CCP), or is in the class CCP, if 

\[ \mathbb{P}\big[\,|f(X) - \mathbb{M}f(X)| \ge t\,\big] \le K e^{-\kappa t^2} \tag{1} \]

for every $t > 0$ and every convex 1-Lipschitz function $f : V \to \mathbb{R}$, where $K, \kappa > 0$ are
constants (parameters) independent of $f$ and $t$, and $\mathbb{M}$ denotes a median of a random
variable. Even though not explicitly defined, this property already made an appearance
in [28]. The class CCP enjoys various stability properties; for example, if $X, Y$ satisfy the
CCP, so does their concatenation $(X, Y)$ (as follows from the proof of [19, Proposition
1.11]). Clearly, various generalizations of the concept are possible. For example, one may
consider tail behaviors other than subgaussian, or allow other classes of test functions $f$;
see, e.g., [1].

While the subgaussian tail condition in (1) may appear stringent, it is verified by many
natural classes of multivariate distributions. For example, if $V = \mathbb{R}^n$ and the components
$X_i$ are independent normal random variables with uniformly bounded variances, or if the
random variables $(X_i - \mathbb{E}X_i)$ are uniformly bounded ($\mathbb{E}$ stands for the expected value of a
random variable), then $X$ satisfies the CCP. Examples with dependent components include
$X$ uniform on $\sqrt{n}\,S^{n-1}$, or with a density proportional to $e^{-u(x)}$, where the Hessian of $u$
verifies $D^2u \ge cI$, $c > 0$. See [19] for multiple proofs of all these statements and much
more information, and [1] for a discussion of various fine points concerning the class CCP.
Here we will just mention that the validity of the first example is a consequence of Borell- 
Sudakov-Tsirelson Gaussian isoperimetric inequality, the second one is the primary instance 
of Talagrand's approach to concentration on product spaces, the third one follows from 
Paul Levy's spherical isoperimetric inequality, and the last is a consequence of the theory 
of logarithmic Sobolev inequalities. We emphasize that the common and crucial feature 
of all these examples, and of others that will follow, is dimension independence: while the 
parameters $K, \kappa$ in (1) may depend on the characteristics of the family in question (for
instance, on the bound on variances implicit in the first example above, or on the value of c 
in the last example), they do not depend on the dimension of the underlying vector space. 

As is well known and easy to check, a concentration inequality of the type (1) implies that
the mean and median of $f(X)$ differ by at most a constant (depending only on the parameters
$K, \kappa$; see, e.g., [19, Section 1.3] or [22, Proposition V.4]); it follows that concentration
about the median is equivalent to concentration about the mean up to modification of the
constants in (1). At different points in the results and proofs below it will be convenient to
work with either the mean or the median. 



3. Matrix ensembles: the main result 

We denote by $M_n$ the space of $n \times n$ complex matrices, by $M_n^{sa}$ its (real vector)
subspace of Hermitian matrices, and by $\|A\|_p := (\operatorname{tr}(A^*A)^{p/2})^{1/p}$ the Schatten $p$-norm of a
matrix $A$; the limiting case $p = \infty$ corresponds to the operator (or spectral) norm, while
$p = 2$ leads to the Hilbert-Schmidt (or Frobenius) norm. We also denote by $\|\cdot\|_p$ the $L_p$-norm
of a (real or complex) random variable, or the $\ell_p$-norm of a vector in $\mathbb{R}^n$ or $\mathbb{C}^n$. Below
and in what follows $C, C_1, C', c$ etc. stand for positive numerical constants, whose value
may change from line to line. Similarly (for example) $C_{d,m}$ will denote a positive constant
which may depend on the parameters $d$ and $m$, but not on the underlying dimension. Such
constants will in general depend implicitly on the parameters $K, \kappa$ in (1) and, if applicable,
on other constants appearing in the hypotheses of a particular statement; this dependence 
will be straightforward to make explicit but for the sake of simplicity we have mostly not 
chosen to do so here. 



Theorem 1. Let $X_1, \dots, X_m \in M_n$ be independent centered random matrices which satisfy
the convex concentration property (with respect to the Hilbert-Schmidt norm on $M_n$) and
let $d \ge 1$ be an integer. Let $P$ be a noncommutative $*$-polynomial in $m$ variables of degree
at most $d$, normalized so that its coefficients have modulus at most 1. Define the complex
random variable

\[ Z_P = \operatorname{tr} P\Big(\frac{X_1}{\sqrt{n}}, \dots, \frac{X_m}{\sqrt{n}}\Big). \]

Then, for $t > 0$,

\[ \mathbb{P}\big[\,|Z_P - \mathbb{E}Z_P| \ge t\,\big] \le C_{m,d} \exp\big[-c_{m,d} \min\{t^2, n t^{2/d}\}\big]. \]

The conclusion holds also for non-centered random matrices if (when $d \ge 2$) we assume
that $\|\mathbb{E}X_j\|_{2(d-1)} \le C n^{d/(2(d-1))}$ for all $j$.



It is a standard observation that, by integration by parts on the one hand and the
Bienaymé-Chebyshev-Markov inequality on the other hand, a tail bound as in Theorem 1
is equivalent to a bound on the growth of $L_p$-norms.

Corollary 2. Let $Z_P$ be as in Theorem 1. Then for $q \ge 1$,

\[ \|Z_P - \mathbb{E}Z_P\|_q \le C_{m,d} \max\Big\{\sqrt{q}, \Big(\frac{q}{n}\Big)^{d/2}\Big\}. \]

Remarks: 

1. The hypotheses of Theorem 1 cover Wigner-type matrices with independent Gaussian
or independent bounded entries, but not arbitrary independent subgaussian entries
(see [1] and its references; note that CCP clearly implies that the entries are subgaussian).
However, independent entries satisfying a logarithmic Sobolev inequality, or
more generally a quadratic transportation cost inequality, are covered (see [19, Chapters
5-6]). Moreover, the hypotheses also cover many cases with dependent matrix
entries. The most notable are the following:

(a) $X_j$ drawn from an orthogonal or unitary ensemble, that is, with a density w.r.t.
Lebesgue measure on $M_n^{sa}$ proportional to $e^{-\operatorname{tr} u(A)}$, in the case that $u : \mathbb{R} \to \mathbb{R}$
satisfies $u'' \ge c > 0$. (This again follows from the theory of logarithmic Sobolev
inequalities.) Ensembles of this form are widely studied in the literature (see,
e.g., [8]); in the context of nuclear physics this is a more natural class than that
of Wigner matrices.

(b) $X_j$ such that $n^{-1/2}X_j$ is uniformly distributed in the (special) orthogonal or
unitary group (see [22, Section 6] or [19, Section 2.1]).

(c) $X_j$ uniformly distributed on the (Hilbert-Schmidt) sphere of $M_n^{sa}$ of radius
$\sqrt{n(n+1)/2}$ or $n$ in the real or complex case, respectively; or uniformly distributed
on the sphere of radius $n$. (In fact, any $O(n)$ radii would do, but the
exact values we cite here appear in a natural way.)

2. A perhaps more natural way to state the bound on $\mathbb{E}X_j$ in the non-centered case (if
each $X_j$ is Hermitian) is

\[ \operatorname{tr}\Big(\mathbb{E}\frac{X_j}{\sqrt{n}}\Big)^{2(d-1)} \le Cn. \]

A slightly stronger simple hypothesis is $\|\mathbb{E}X_j\|_\infty \le C\sqrt{n}$.

3. It is not strictly necessary that the $X_j$ be independent, only that the joint distribution
of $(X_1, \dots, X_m) \in \bigoplus_{j=1}^m M_n$ satisfy the convex concentration property, with
constants that may depend on $m$.

4. When $d \ge 2$, it suffices for the proof to assume that $X_j$ satisfies the convex concentration
property with respect to the Schatten norm $\|\cdot\|_d$ on $M_n$, but it is not clear
whether this is a useful observation.

4. The background and the consequences

Here is a consequence of Theorem 1 in the spirit of the original inquiry of Bryc. For
simplicity, we state it in the real case only.

Corollary 3. Let $X_1, \dots, X_m$, $P$ and $Z_P$ be as in Theorem 1, and assume further that,
for each $j$, $X_j$ is real symmetric and its upper-diagonal entries are independent and of unit
variance. Then, almost surely,

\[ \lim_{n \to \infty} n^{-1}Z_P = \tau(P(a_1, a_2, \dots, a_m)), \]

where $a_1, a_2, \dots, a_m$ are free semicircular elements in a noncommutative probability space
$(\mathcal{A}, \tau)$.
The connection between random matrices and free probability was established in the
seminal paper [33], where the weaker convergence $n^{-1}\mathbb{E}Z_P \to \tau(P(a_1, a_2, \dots, a_m))$ was
shown in the Gaussian case (we refer to [34], [10] for more background on free probability).
This was generalized to (in particular) other Wigner-like ensembles in [9] , and strengthened 
in various ways in [El [271 HI [Ml EH EO] . 

The fact that the weaker convergence (of expected values) in combination with concentration
(which was known for Gaussian and some other classical ensembles) implies almost
sure convergence was essentially folklore (see [13], [5]): the deviation of $n^{-1}Z_P$ from its
expected value has a tail that decays (at least) exponentially in $n$, hence the Borel-Cantelli
lemma applies. Note that rescaling by $n^{-1}$ is appropriate since the noncommutative probability
context calls for the normalized trace $n^{-1}\operatorname{tr}$.

The same argument applies to any other ensemble which verifies the CCP and for which 
the limit object — in the (weak) noncommutative probability sense — exists. On the other 
hand, results along the lines of Corollary 3 can also be proved without Theorem 1 and in
particular under weaker assumptions than exponential concentration. Theorem 2 of [25]
proves what amounts to the conclusion of Corollary 3 for Wigner matrices with i.i.d. entries
with bounded fourth moments; see [25] for references to earlier results proved under stronger 
assumptions. In addition, concentration inequalities for some noncommutative functionals 
of random matrices — but not polynomials — appeared already in (Theorem 1.9; the 
entries are required to satisfy logarithmic Sobolev inequality). 

Finally, let us point out that there is a fairly extensive literature on the tail behavior of 
"higher order chaoses" (i.e., polynomials) in classical probability, i.e., without focus on the 
issues related to the matrix structure or noncommutativity, for example [71 [IH H] . There 
are also applications of concentration of polynomials to combinatorics [15], [16], [35].



5. The proof: a special case

Theorem 1 will be deduced from the special case of a power of a single Hermitian random
matrix.

Proposition 4. Let $X \in M_n^{sa}$ be a random Hermitian matrix which satisfies the convex
concentration property (with respect to the Hilbert-Schmidt norm on $M_n^{sa}$), let $d \ge 1$ be an
integer, and suppose (when $d \ge 2$) that $\|\mathbb{E}X\|_{2(d-1)} \le C n^{d/(2(d-1))}$. Then for $t > 0$,

\[ \mathbb{P}\Big[\,\Big|\operatorname{tr}\Big(\frac{X}{\sqrt{n}}\Big)^d - \mathbb{M}\operatorname{tr}\Big(\frac{X}{\sqrt{n}}\Big)^d\Big| \ge t\,\Big] \le C \exp\big[-\min\{c_d t^2, c n t^{2/d}\}\big]. \]

The essential idea in the proof of this concentration inequality is of course to apply the
CCP to the functional $A \mapsto \operatorname{tr} A^d$, but there are two obvious difficulties with this approach.
One is that this functional is not convex if $d$ is odd and $d \ge 3$, and the convexity is not
entirely trivial when $d$ is even; this technicality is readily dealt with by using a classical
convexity lemma and (in the odd case) a simple decomposition trick. The second, more
fundamental problem is that when $d \ge 2$ this functional is not Lipschitz (in fact, not even
uniformly continuous). However, it is locally Lipschitz in a way which is readily quantified,
so that a variation of standard truncation arguments can be applied. Extra care is needed
here to show that the truncation procedure can be made to preserve the convexity of the
functional and its Lipschitz constant, and to control the effect of the truncation on the
median. The following folklore result will be helpful.

Lemma 5. Let $V$ be a finite-dimensional normed space, $K \subset V$ an open convex set, and
$F : K \to \mathbb{R}$ a convex Lipschitz function. Then there exists a function $\tilde{F} : V \to \mathbb{R}$ such that

• $\tilde{F}$ is convex and $\tilde{F}|_K = F$ (i.e., $\tilde{F}$ is a convex extension of $F$);

• $\tilde{F}$ is pointwise minimal among all convex extensions of $F$; and

• $\tilde{F}$ is Lipschitz, and its Lipschitz constant is the same as that of $F$.

Proof. For $y \in K$, recall that (cf. [26, Section 23])

\[ \partial F(y) = \{\phi \in V^* \mid F(x) \ge F(y) + \phi(x - y) \text{ for all } x \in K\} \]

is the subdifferential of $F$ at $y$ (nonempty because $F$ is convex), so that

\[ F(x) = \sup\{F(y) + \phi(x - y) \mid \phi \in \partial F(y),\ y \in K\} \tag{2} \]

for every $x \in K$. Moreover, the Lipschitz constant of $F$ (on $K$) is

\[ \sup\{\|\phi\| \mid \phi \in \partial F(y),\ y \in K\} \]

(cf. [26, Corollary 13.3.3]). This implies that the supremum in (2) is finite also for $x \notin K$
and thus defines an extension $\tilde{F} : V \to \mathbb{R}$. The assertions of the lemma follow easily from
this definition. □
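As a one-dimensional illustration of Lemma 5, the extension defined by the supremum in (2) can be approximated by maximizing tangent lines over sample points of $K$. This is only a sketch: the choice $F(x) = x^2$ on $K = (-a, a)$, the grid `ys`, and the helper name `extend` are assumptions made for the example and do not come from the text. For this $F$ one expects the extension to equal $x^2$ inside $K$ and $2a|x| - a^2$ outside, with Lipschitz constant $2a$.

```python
def extend(F, dF, ys):
    """Return x -> max over sample points y of the tangent line F(y) + dF(y) * (x - y).

    This approximates the supremum in (2) when ys is a fine grid in K
    (a hypothetical helper, valid here because F is differentiable).
    """
    def F_ext(x):
        return max(F(y) + dF(y) * (x - y) for y in ys)
    return F_ext


a = 1.0          # K = (-a, a); an assumption of this example
N = 2000         # number of grid points in K
ys = [-a + 2 * a * (k + 0.5) / N for k in range(N)]

F = lambda x: x * x      # convex, and 2a-Lipschitz on K
dF = lambda x: 2 * x     # its derivative (the unique subgradient)

F_ext = extend(F, dF, ys)

for x in (0.3, 1.5, 3.0):
    inside = abs(x) < a
    expected = x * x if inside else 2 * a * abs(x) - a * a
    print(x, F_ext(x), expected)
```

Outside $K$ every tangent line has slope at most $2a$ in absolute value, which is why the truncated functional keeps the Lipschitz constant it had on $K$.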



Proof of Proposition 4. The case $d = 1$ is an immediate consequence of the CCP (1), so we
will assume from now on that $d \ge 2$.

Let $F : M_n^{sa} \to \mathbb{R}$ be given by

\[ F(A) = \operatorname{tr} A^d = \sum_{i=1}^n \lambda_i(A)^d, \]

where $\lambda_i(A)$ are the eigenvalues of $A$ in, say, nonincreasing order. A classical lemma of
matrix analysis (see e.g. [3, Lemma 4.4.12]) states that a functional $A \mapsto \operatorname{tr}\phi(A)$ is convex
whenever $\phi : \mathbb{R} \to \mathbb{R}$ is convex; hence in particular our $F$ is convex when $d$ is even. If $d \ge 3$
is odd, then we can write $F(A) = F^+(A) - F^-(A)$, where

\[ F^\pm(A) = \sum_{i=1}^n \lambda_i(A)_\pm^d. \]

Here $x_+ = \max\{0, x\}$ and $x_- = \max\{0, -x\}$. Since both the functions $x \mapsto x_\pm^d$ are convex,
$F^\pm : M_n^{sa} \to \mathbb{R}$ are both convex. In the rest of this proof, for clarity of exposition, we will
proceed as if $d$ is even. The odd case is handled in the same way by considering $F^+$ and
$F^-$ separately, then deducing the concentration of

\[ F(X) - \mathbb{E}F(X) = (F^+(X) - \mathbb{E}F^+(X)) - (F^-(X) - \mathbb{E}F^-(X)) \]

from the concentration of each summand and the triangle inequality.

Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by $f(x) = \sum_{i=1}^n x_i^d$. Another classical lemma of matrix analysis
(see e.g. [3, Lemma 2.1.19 and Remark 2.1.20]) states that the map $A \mapsto (\lambda_1(A), \dots, \lambda_n(A))$
is 1-Lipschitz from $M_n^{sa}$ with the Hilbert-Schmidt norm to $\mathbb{R}^n$ with the standard Euclidean
norm. The local Lipschitz behavior of $F$ can therefore be controlled via the local Lipschitz
behavior of $f$, for which we compute

\[ |\nabla f(x)| = d\Big(\sum_{i=1}^n x_i^{2(d-1)}\Big)^{1/2} = d\,\|x\|_{2(d-1)}^{d-1}. \]
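The gradient computation can be checked numerically. The sketch below compares a central-difference gradient of $f(x) = \sum_i x_i^d$ with the closed form $d\,\|x\|_{2(d-1)}^{d-1}$; the test vector, the step size `h`, and the function names are assumptions made for the illustration.

```python
import math


def f(x, d):
    """f(x) = sum_i x_i^d, the eigenvalue form of tr A^d."""
    return sum(xi ** d for xi in x)


def grad_norm_numeric(x, d, h=1e-6):
    """Euclidean norm of the gradient of f, estimated by central differences."""
    total = 0.0
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        dn = list(x)
        dn[i] -= h
        total += ((f(up, d) - f(dn, d)) / (2 * h)) ** 2
    return math.sqrt(total)


def grad_norm_formula(x, d):
    """d * ||x||_{2(d-1)}^{d-1}, written as d * (sum_i x_i^{2(d-1)})^{1/2}."""
    return d * math.sqrt(sum(xi ** (2 * (d - 1)) for xi in x))


x = [0.5, -1.25, 2.0, 0.75]   # an arbitrary test point
for d in (2, 3, 4):
    print(d, grad_norm_numeric(x, d), grad_norm_formula(x, d))
```

The formula shows why the Lipschitz constant of $f$ on a set is governed by the $\ell_{2(d-1)}$-norm there, which is the norm used to define the truncation sets.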



We now describe our truncation procedure. For each $a > 0$, we set

\[ K_a = \{A \in M_n^{sa} \mid \|A\|_{2(d-1)} < a\}; \]

then $F|_{K_a}$ is $(da^{d-1})$-Lipschitz. At this point we appeal to Lemma 5 to obtain convex
$(da^{d-1})$-Lipschitz extensions $F_a : M_n^{sa} \to \mathbb{R}$ to which the CCP applies. Moreover, since
$\{K_a\}$ is a nested family of open convex sets whose union is $M_n^{sa}$, the minimality property
from Lemma 5 implies that, for each $A \in M_n^{sa}$, $F_a(A)$ increases to $F(A)$ as $a \to \infty$.

The other necessary ingredient for the truncation-type argument is an upper bound on
the probability of the event that $X \notin K_a$. For this, we begin with a standard discretization
argument to bound the operator norm of $(X - \mathbb{E}X)$. [The argument is neither optimal
(better constants are possible) nor the quickest (for an expert in probability, appealing to
comparison theorems for subgaussian processes [31] would yield the result much faster), but
we include it for the sake of completeness.] Let $\mathcal{N}$ be a $\frac{1}{3}$-net in the unit sphere of $\mathbb{C}^n$
with $|\mathcal{N}| \le 7^{2n}$ (see [22, Lemma 2.6] or [32, Lemma 2]), and for $A \in M_n^{sa}$ define

\[ \|A\|_{\mathcal{N}} = \sup_{v \in \mathcal{N}} |\langle Av, v\rangle|. \]

Then $\|A\|_\infty \le 3\|A\|_{\mathcal{N}}$ by [32, Lemma 4].

For each $u \in S^{2n-1}$, $A \mapsto |\langle Au, u\rangle|$ is a convex and 1-Lipschitz function $M_n^{sa} \to \mathbb{R}$, so by
the CCP (1),

\[ \mathbb{P}[\|X - \mathbb{E}X\|_\infty \ge t] \le \mathbb{P}[\|X - \mathbb{E}X\|_{\mathcal{N}} \ge t/3]
\le \sum_{v \in \mathcal{N}} \mathbb{P}[|\langle (X - \mathbb{E}X)v, v\rangle| \ge t/3] \le C\,7^{2n} e^{-ct^2}. \]

From this it follows that $\mathbb{M}\|X - \mathbb{E}X\|_\infty \le C\sqrt{n}$. Since $\|A\|_\infty \le \|A\|_2$, the CCP (1) applies
to the function $f(A) = \|A\|_\infty$ and so $\mathbb{E}\|X - \mathbb{E}X\|_\infty \le C\sqrt{n}$ as well. (Alternatively, this
latter estimate follows by combining the inequality above with integration by parts.)

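We also have the elementary estimate (a very weak consequence of CCP)

\[ \mathbb{E}\|X - \mathbb{E}X\|_2 \le \sqrt{\mathbb{E}\|X - \mathbb{E}X\|_2^2} = \sqrt{\sum_{i,j}\mathbb{E}|x_{ij} - \mathbb{E}x_{ij}|^2} \le Cn. \]

From the above estimates and Hölder's inequality, we obtain that for $p \ge 2$,

\begin{align*}
\mathbb{E}\|X - \mathbb{E}X\|_p &\le \mathbb{E}\big(\|X - \mathbb{E}X\|_2^{2/p}\,\|X - \mathbb{E}X\|_\infty^{1-2/p}\big) \\
&\le \big(\mathbb{E}\|X - \mathbb{E}X\|_2\big)^{2/p}\big(\mathbb{E}\|X - \mathbb{E}X\|_\infty\big)^{1-2/p} \le Cn^{1/2 + 1/p}.
\end{align*}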

Specifying $p = 2(d-1)$ yields

\[ \mathbb{E}\|X\|_{2(d-1)} \le \mathbb{E}\|X - \mathbb{E}X\|_{2(d-1)} + \|\mathbb{E}X\|_{2(d-1)} \le Cn^{d/(2(d-1))}. \]

(It is here that our hypothesis for non-centered random matrices enters into play, and where
the form of the hypothesis is clarified.) Now $\|A\|_{2(d-1)} \le \|A\|_2$, so the CCP (1) applies to the
function $f(A) = \|A\|_{2(d-1)}$. This implies finally that for $a = n^{d/(2(d-1))}b$, $b \ge C$,

\[ \mathbb{P}[\|X\|_{2(d-1)} \ge a] \le C\exp[-cn^{d/(d-1)}b^2]. \]
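The two elementary facts behind the Hölder step can be checked directly: the interpolation inequality $\|x\|_p \le \|x\|_2^{2/p}\|x\|_\infty^{1-2/p}$ for $p \ge 2$ (applied to the vector of singular values), and the exponent arithmetic that turns the bounds $Cn$ and $C\sqrt{n}$ into $n^{d/(2(d-1))}$ when $p = 2(d-1)$. The sketch below is an illustration only; the sample vector `sing_vals` and the function names are assumptions.

```python
from fractions import Fraction


def lp_norm(x, p):
    """The l_p norm of a finite sequence (Schatten p-norm if x lists singular values)."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def interpolation_bound(x, p):
    """||x||_2^{2/p} * ||x||_inf^{1 - 2/p}, an upper bound for ||x||_p when p >= 2."""
    l2 = lp_norm(x, 2)
    linf = max(abs(t) for t in x)
    return l2 ** (2.0 / p) * linf ** (1.0 - 2.0 / p)


def growth_exponent(p):
    """Power of n obtained by combining a bound of order n (exponent 1) on the
    Hilbert-Schmidt norm with a bound of order sqrt(n) (exponent 1/2) on the
    operator norm, with weights 2/p and 1 - 2/p."""
    p = Fraction(p)
    return (2 / p) * 1 + (1 - 2 / p) * Fraction(1, 2)


sing_vals = [3.0, 1.5, 0.2, 2.2, 0.9]   # hypothetical singular values
for d in (2, 3, 5):
    p = 2 * (d - 1)
    print(d, lp_norm(sing_vals, p), interpolation_bound(sing_vals, p), growth_exponent(p))
```

Exact rational arithmetic is used for the exponent so that the identity $1/2 + 1/p = d/(2(d-1))$ is verified without rounding.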



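We are now ready to carry out the argument to bound the tails of $(F(X) - \mathbb{M}F(X))$ by
(in particular) appropriately choosing the truncation level $a$. Recall that $F_a : M_n^{sa} \to \mathbb{R}$ are
the functions provided by Lemma 5. The monotonicity in $a$ of $F_a(A)$ implies that $\mathbb{M}F_a(X)$
increases in $a$ to $\mathbb{M}F(X)$. Letting $a = C_1n^{d/(2(d-1))}$ and applying the CCP (1) to $F_a$ we
obtain

\begin{align*}
\mathbb{P}[F(X) \ge \mathbb{M}F_a(X) + s] &= \mathbb{P}[(F_a(X) \ge \mathbb{M}F_a(X) + s) \text{ and } (X \in K_a)] \\
&\quad + \mathbb{P}[(F(X) \ge \mathbb{M}F_a(X) + s) \text{ and } (X \notin K_a)] \\
&\le \mathbb{P}[F_a(X) \ge \mathbb{M}F_a(X) + s] + \mathbb{P}[X \notin K_a] \\
&\le C\exp\Big[-\frac{cs^2}{d^2a^{2(d-1)}}\Big] + C\exp\big[-cn^{d/(d-1)}C_1^2\big].
\end{align*}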



Therefore if $C_1$ is chosen large enough (independently of $n$ and $d$), then

\[ \mathbb{P}\big[F(X) \ge \mathbb{M}F_a(X) + C_2dC_1^{d-1}n^{d/2}\big] < \frac{1}{2} \]

for some $C_2 > 0$, and so $\mathbb{M}F(X) \le \mathbb{M}F_a(X) + dC_3^dn^{d/2}$. Since $\mathbb{M}F_a(X)$ increases monotonically
with $a$, we obtain

\[ |\mathbb{M}F(X) - \mathbb{M}F_a(X)| \le dC_3^dn^{d/2} \]

for every $a \ge C_1n^{d/(2(d-1))}$. (This is the point at which it is most convenient to be working
with the median instead of the mean, since for a fixed $a$ the bound we get for
$\mathbb{P}[|F(X) - \mathbb{M}F_a(X)| \ge s]$ is not integrable.)

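Now set $a = bn^{d/(2(d-1))}$ with $b \ge C_1$. For $s \ge 2dC_3^dn^{d/2}$, by applying the CCP (1) to $F_a$
again,

\begin{align*}
\mathbb{P}[|F(X) - \mathbb{M}F(X)| \ge s] &= \mathbb{P}[(|F_a(X) - \mathbb{M}F(X)| \ge s) \text{ and } (X \in K_a)] \\
&\quad + \mathbb{P}[(|F(X) - \mathbb{M}F(X)| \ge s) \text{ and } (X \notin K_a)] \\
&\le \mathbb{P}[|F_a(X) - \mathbb{M}F_a(X)| \ge (s - dC_3^dn^{d/2})] + \mathbb{P}[X \notin K_a] \\
&\le C\exp\Big[-\frac{c(s - dC_3^dn^{d/2})^2}{d^2a^{2(d-1)}}\Big] + C\exp\big[-cn^{d/(d-1)}b^2\big] \\
&\le C\exp\Big[-\frac{cs^2}{d^2b^{2(d-1)}n^d}\Big] + C\exp\big[-cn^{d/(d-1)}b^2\big].
\end{align*}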
If $s \le C_1^d\,d\,n^{d^2/(2(d-1))}$ and $b = C_1$, then the first term in the last estimate dominates the
second. If $s > C_1^d\,d\,n^{d^2/(2(d-1))}$, then setting $b = d^{-1/d}n^{-d/(2(d-1))}s^{1/d}$ results in both exponents
being of the same order, and we obtain

\[ \mathbb{P}[|F(X) - \mathbb{M}F(X)| \ge s] \le C\exp\Big[-\min\Big\{\frac{c_ds^2}{n^d}, cs^{2/d}\Big\}\Big] \]

for all $s \ge 2dC_3^dn^{d/2}$. The inequality above is vacuously true (with appropriately chosen
constants) if $s < 2dC_3^dn^{d/2}$. Finally, substituting $s = n^{d/2}t$ yields the bound in the statement
of the proposition. □
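The choice of $b$ that balances the two exponents can be verified by direct substitution. In the sketch below the unspecified constants $C, c$ are dropped (set to 1), so only the dependence on $s$, $n$, $d$, $b$ is checked; the numerical values of `n`, `d`, `s`, `C1` and the function names are assumptions made for the illustration.

```python
import math


def exponents(s, n, d, b):
    """The two exponents (constants dropped) for truncation level a = b * n^{d/(2(d-1))}."""
    lipschitz_term = s ** 2 / (d ** 2 * b ** (2 * (d - 1)) * n ** d)
    truncation_term = n ** (d / (d - 1)) * b ** 2
    return lipschitz_term, truncation_term


def balancing_b(s, n, d):
    """b = d^{-1/d} * n^{-d/(2(d-1))} * s^{1/d}, which equalizes the two exponents."""
    return d ** (-1 / d) * n ** (-d / (2 * (d - 1))) * s ** (1 / d)


def threshold(C1, n, d):
    """The value s = C1^d * d * n^{d^2/(2(d-1))} at which the balancing b equals C1."""
    return C1 ** d * d * n ** (d * d / (2 * (d - 1)))


n, d, C1 = 50, 3, 2.0
s = 1e7                      # lies above threshold(C1, n, d)
b = balancing_b(s, n, d)
e1, e2 = exponents(s, n, d, b)
print(b, e1, e2, (s / d) ** (2 / d))
print(balancing_b(threshold(C1, n, d), n, d))   # recovers C1 up to rounding
```

With this $b$ both exponents equal $(s/d)^{2/d}$, which is the source of the $s^{2/d}$ term in the final bound.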

Parts of the analysis of this section can be performed for functionals more general than
traces of powers, e.g., $A \mapsto \operatorname{tr}\phi(A)$ for $\phi : \mathbb{R} \to \mathbb{R}$ a convex Lipschitz function as already
considered in [11]. In an even less restrictive framework, by replacing the convexity lemma
[3, Lemma 4.4.12] used above and in [11] with the more general result of j^, one can consider
functionals of the form $A \mapsto f(\lambda_1(A), \dots, \lambda_n(A))$ for a symmetric, convex, Lipschitz function
$f : \mathbb{R}^n \to \mathbb{R}$; see [HI Corollary 8.23].

6. The general case: polarization and other tricks 

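To deduce a version of Proposition 4 for non-Hermitian matrices, we use the following
polarization identity.

Lemma 6. For any $A, B \in M_n$,

\[ A^d = \frac{1}{d+1}\sum_{j=0}^d \big(A + e^{2\pi ij/(d+1)}B\big)^d. \]

In particular,

\[ A^d = \frac{1}{d+1}\sum_{j=0}^d e^{\pi ijd/(d+1)}\big(e^{-\pi ij/(d+1)}A + e^{\pi ij/(d+1)}A^*\big)^d. \]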
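The polarization identity is easy to test numerically on small matrices. The following sketch uses plain nested lists for $2 \times 2$ complex matrices; the particular matrices `A`, `B`, the degree `d = 3`, and the helper names are assumptions chosen for the illustration.

```python
import cmath


def matmul(P, Q):
    """Product of two square matrices given as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matpow(P, d):
    """d-th power of a square matrix by repeated multiplication."""
    n = len(P)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(d):
        R = matmul(R, P)
    return R


def polarized_power(A, B, d):
    """(1/(d+1)) * sum_{j=0}^{d} (A + e^{2 pi i j/(d+1)} B)^d."""
    n = len(A)
    total = [[0j] * n for _ in range(n)]
    for j in range(d + 1):
        w = cmath.exp(2j * cmath.pi * j / (d + 1))   # (d+1)-th root of unity
        S = [[A[r][c] + w * B[r][c] for c in range(n)] for r in range(n)]
        T = matpow(S, d)
        total = [[total[r][c] + T[r][c] / (d + 1) for c in range(n)] for r in range(n)]
    return total


A = [[1 + 2j, 0.5], [-1j, 2]]
B = [[0.3, 1j], [2, -1 + 1j]]
d = 3
lhs = matpow(A, d)
rhs = polarized_power(A, B, d)
err = max(abs(lhs[r][c] - rhs[r][c]) for r in range(2) for c in range(2))
print(err)   # should be at the level of floating-point rounding
```

Averaging over the $(d+1)$-th roots of unity removes every term of the expansion that contains at least one factor $B$, which is why $B$ may be arbitrary.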

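Proof. Expanding the sum, there are matrices $M_k$, $k = 0, \dots, d$, with $M_0 = A^d$ such that

\[ \big(A + e^{2\pi ij/(d+1)}B\big)^d = \sum_{k=0}^d e^{2\pi ijk/(d+1)}M_k, \qquad j = 0, \dots, d. \]

The $(d+1) \times (d+1)$ Fourier matrix $\Big[\frac{1}{\sqrt{d+1}}e^{2\pi ijk/(d+1)}\Big]_{j,k=0}^d$ is unitary, so inverting the above
relations yields

\[ M_k = \frac{1}{d+1}\sum_{j=0}^d e^{-2\pi ijk/(d+1)}\big(A + e^{2\pi ij/(d+1)}B\big)^d. \]

Taking $k = 0$ gives the first identity. The second follows by setting $B = A^*$ and writing
$A + e^{2i\theta}A^* = e^{i\theta}(e^{-i\theta}A + e^{i\theta}A^*)$ with $\theta = \pi j/(d+1)$. □

Corollary 7. Let $X \in M_n$ be a random matrix which satisfies the convex concentration
property (with respect to the Hilbert-Schmidt norm on $M_n$), let $d \ge 1$ be an integer, and
suppose (when $d \ge 2$) that $\|\mathbb{E}X\|_{2(d-1)} \le Cn^{d/(2(d-1))}$. Then for $t > 0$,

\[ \mathbb{P}\Big[\,\Big|\operatorname{tr}\Big(\frac{X}{\sqrt{n}}\Big)^d - \mathbb{E}\operatorname{tr}\Big(\frac{X}{\sqrt{n}}\Big)^d\Big| \ge t\,\Big] \le C(d+1)\exp\big[-\min\{c_dt^2, cnt^{2/d}\}\big]. \]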



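Proof. Observe that for any $\theta \in \mathbb{R}$, $A \mapsto e^{-i\theta}A + e^{i\theta}A^*$ is a 2-Lipschitz map $M_n \to M_n^{sa}$.
Thus $Y_\theta = e^{-i\theta}X + e^{i\theta}X^*$ satisfies the hypotheses of Proposition 4. As remarked earlier,
in the conclusion of Proposition 4 the median may be replaced by the mean. Set
$\theta_j = \pi j/(d+1)$ for $j = 0, 1, \dots, d$. Then, by Lemma 6,

\[ \Big(\frac{X}{\sqrt{n}}\Big)^d = \frac{1}{d+1}\sum_{j=0}^d e^{i\theta_jd}\Big(\frac{Y_{\theta_j}}{\sqrt{n}}\Big)^d, \]

and hence, by Proposition 4,

\begin{align*}
\mathbb{P}\Big[\,\Big|\operatorname{tr}\Big(\frac{X}{\sqrt{n}}\Big)^d - \mathbb{E}\operatorname{tr}\Big(\frac{X}{\sqrt{n}}\Big)^d\Big| \ge t\,\Big]
&\le \mathbb{P}\Big[\,\frac{1}{d+1}\sum_{j=0}^d\Big|\operatorname{tr}\Big(\frac{Y_{\theta_j}}{\sqrt{n}}\Big)^d - \mathbb{E}\operatorname{tr}\Big(\frac{Y_{\theta_j}}{\sqrt{n}}\Big)^d\Big| \ge t\,\Big] \\
&\le (d+1)\sup_{\theta}\mathbb{P}\Big[\,\Big|\operatorname{tr}\Big(\frac{Y_\theta}{\sqrt{n}}\Big)^d - \mathbb{E}\operatorname{tr}\Big(\frac{Y_\theta}{\sqrt{n}}\Big)^d\Big| \ge t\,\Big] \\
&\le C(d+1)\exp\big[-\min\{c_dt^2, cnt^{2/d}\}\big].
\end{align*}

□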



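Proof of Theorem 1. By the triangle inequality, it suffices to consider the case when $P$
is a noncommutative $*$-monomial. (Note that for fixed $m$ and $d$ there are, up to scalar
multiples, only finitely many distinct noncommutative $*$-monomials of degree at most $d$ in
$m$ variables.) Write $P(x_1, \dots, x_m) = y_1 \cdots y_d$, where each $y_j$ is equal to some $x_k$ or $x_k^*$, and
then define

\[ X = \begin{pmatrix}
0 & Y_1 & & & \\
 & 0 & Y_2 & & \\
 & & \ddots & \ddots & \\
 & & & 0 & Y_{d-1} \\
Y_d & & & & 0
\end{pmatrix} \]

analogously (each $Y_j$ being the corresponding $X_k$ or $X_k^*$). It is easy to verify that

\[ X^d = \begin{pmatrix}
Y_1Y_2\cdots Y_d & & & \\
 & Y_2Y_3\cdots Y_dY_1 & & \\
 & & \ddots & \\
 & & & Y_dY_1\cdots Y_{d-1}
\end{pmatrix}, \]

so that $\operatorname{tr}X^d = d\operatorname{tr}P(X_1, \dots, X_m)$. Furthermore, $X$ satisfies the convex concentration
property on $M_{dn}$, with constants that may now depend on $d$ (cf. [19, Proposition 1.11]).
The theorem now follows by applying Corollary 7 to $X$. □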
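The block-matrix trick can be checked on a small integer example. The sketch below builds the cyclic block matrix from three $2 \times 2$ blocks and compares $\operatorname{tr}X^d$ with $d\operatorname{tr}(Y_1 \cdots Y_d)$; the specific blocks `Ys` and the helper names are assumptions made for the illustration.

```python
def matmul(P, Q):
    """Product of two matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


def cyclic_block(Ys):
    """dn x dn matrix with Y_1, ..., Y_{d-1} on the block superdiagonal
    and Y_d in the bottom-left corner."""
    d, n = len(Ys), len(Ys[0])
    X = [[0] * (d * n) for _ in range(d * n)]
    for j, Y in enumerate(Ys):
        row0 = j * n
        col0 = ((j + 1) % d) * n      # wraps around for the last block
        for r in range(n):
            for c in range(n):
                X[row0 + r][col0 + c] = Y[r][c]
    return X


Ys = [[[1, 2], [0, 1]], [[2, -1], [1, 3]], [[0, 1], [4, -2]]]   # hypothetical Y_1, Y_2, Y_3
d = len(Ys)

X = cyclic_block(Ys)
Xd = X
for _ in range(d - 1):
    Xd = matmul(Xd, X)

prod = Ys[0]
for Y in Ys[1:]:
    prod = matmul(prod, Y)

print(trace(Xd), d * trace(prod))
```

Integer entries are used so that the two traces agree exactly, without any floating-point tolerance.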